Function Prediction of Proteins from their Sequences with BAR 3.0

Giuseppe Profiti
Pier Luigi Martelli
Rita Casadio

Abstract

Protein functional annotation requires time and effort, while sequencing technologies are fast and cheap. For this reason, the development of software tools aimed at predicting protein function from sequences can help in protein annotation.


In this paper, we describe how to use the recently implemented version 3.0 of our Bologna Annotation Resource (BAR), a tool for protein structural and functional annotation built on over 30 million protein sequences. In BAR 3.0, sequences are arranged in a similarity graph and clustered together when they share at least 40% sequence identity over at least 90% of the alignment length, yielding a total of 1,361,773 clusters.
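The clustering rule can be illustrated with a minimal sketch. It assumes that pairwise alignments are already available as (identity, coverage) fractions and that clusters are the connected components of the filtered similarity graph; the abstract states only the two thresholds, so the connected-component step, the function name `cluster_sequences`, and the input layout are assumptions made for illustration, not the BAR 3.0 implementation.

```python
from collections import defaultdict

MIN_IDENTITY = 0.40  # threshold stated in the abstract
MIN_COVERAGE = 0.90  # threshold stated in the abstract


def cluster_sequences(sequence_ids, alignments,
                      min_identity=MIN_IDENTITY, min_coverage=MIN_COVERAGE):
    """Group sequences into connected components of the filtered graph.

    alignments: iterable of (id_a, id_b, identity, coverage), with identity
    and coverage given as fractions in [0, 1]. Every id used in an alignment
    must also appear in sequence_ids.
    (Hypothetical input layout, chosen for this sketch.)
    """
    # Union-find: each sequence starts as its own cluster representative.
    parent = {s: s for s in sequence_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # An edge is kept only if it passes both thresholds.
    for a, b, identity, coverage in alignments:
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for s in sequence_ids:
        clusters[find(s)].add(s)
    return sorted(sorted(members) for members in clusters.values())


# Toy data: the P3-P4 alignment fails the coverage threshold.
toy_ids = ["P1", "P2", "P3", "P4"]
toy_alignments = [
    ("P1", "P2", 0.55, 0.95),
    ("P2", "P3", 0.42, 0.91),
    ("P3", "P4", 0.80, 0.50),
]
toy_clusters = cluster_sequences(toy_ids, toy_alignments)
```

In this toy example P1, P2 and P3 end up in one cluster even though P1 and P3 are not directly aligned, while P4 remains a singleton because its only alignment covers too little of the sequence.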


Protein sequences with known function transfer their annotations to the other sequences in the same cluster after statistical validation. Sequences with unknown function, as well as new sequences entering a cluster, inherit its statistically validated annotations.
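As an illustration of annotation transfer after statistical validation, the sketch below tests whether a term is over-represented among the annotated members of a cluster and lets unannotated members inherit only the terms that pass. The abstract does not name the statistical test, so the hypergeometric upper tail, the Bonferroni correction, the 0.01 cutoff, and all function and term names here are assumptions chosen for the example.

```python
from collections import Counter
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n proteins are drawn without replacement from N,
    of which K carry the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(n, K) + 1))
    return hits / total


def validated_terms(cluster_annotations, background_counts, n_background,
                    alpha=0.01):
    """Return the terms significantly enriched in one cluster.

    cluster_annotations: dict of annotated member id -> set of terms.
    background_counts: dict of term -> number of annotated proteins in the
    whole database that carry the term (hypothetical layout).
    """
    n = len(cluster_annotations)
    counts = Counter(t for terms in cluster_annotations.values() for t in terms)
    n_tests = len(counts)  # Bonferroni factor: one test per distinct term
    kept = []
    for term, k in counts.items():
        p = hypergeom_upper_tail(k, n, background_counts[term], n_background)
        if min(1.0, p * n_tests) < alpha:
            kept.append(term)
    return sorted(kept)


def transfer(members, cluster_annotations, background_counts, n_background):
    """Assign the validated terms to every member lacking annotation."""
    terms = validated_terms(cluster_annotations, background_counts, n_background)
    return {m: terms for m in members if m not in cluster_annotations}


# Toy data: TERM_A is rare in the background, TERM_B is very common.
toy_annotations = {
    "P1": {"TERM_A", "TERM_B"},
    "P2": {"TERM_A", "TERM_B"},
    "P3": {"TERM_A", "TERM_B"},
    "P4": {"TERM_A", "TERM_B"},
    "P5": set(),
}
toy_background = {"TERM_A": 10, "TERM_B": 800}
toy_inherited = transfer(["P1", "P2", "P3", "P4", "P5", "NEW1"],
                         toy_annotations, toy_background, 1000)
```

Here only TERM_A is transferred to the new sequence: four of five annotated members carry it although it is rare in the background, whereas TERM_B is so common that its presence in the cluster is not informative.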


The method compares well with other techniques in the Critical Assessment of protein Function Annotation algorithms (CAFA). The CAFA experiment tests the performance of different predictors on a dataset that accumulates annotations over time. BAR predictions have been submitted to every edition of CAFA held so far (BAR Plus in CAFA, BAR++ in CAFA2 and BAR 3.0 in CAFA3). The benchmarking indicates that there is still room for improvement in the field and that BAR ranks among the top-performing methods.


This work focuses on how the tool can transfer statistically significant features to poorly annotated or new sequences derived from transcriptomics or proteomics experiments.

Article Details

Profiti, G., Martelli, P. L., & Casadio, R. (2017). Function Prediction of Proteins from their Sequences with BAR 3.0. Annals of Proteomics and Bioinformatics, 1(1), 001–005. https://doi.org/10.29328/journal.apb.1001001
Short Communications

Copyright (c) 2017 Profiti G, et al.

This work is licensed under a Creative Commons Attribution 4.0 International License.
