Function Prediction of Proteins from their Sequences with BAR 3.0
Abstract
Protein functional annotation requires time and effort, while sequencing technologies are fast and cheap. For this reason, the development of software tools aimed at predicting protein function from sequences can help in protein annotation.
In this paper, we describe how to use our recently implemented Bologna Annotation Resource (BAR) version 3.0, a tool for protein structural and functional annotation built on over 30 million protein sequences. In BAR 3.0, sequences are arranged in a similarity graph and clustered together when they share at least 40% sequence identity over 90% of the alignment length, for a total of 1,361,773 clusters.
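The clustering criterion above can be illustrated with a minimal Python sketch. It is not the BAR 3.0 implementation: it assumes that clusters are the connected components of the similarity graph, that the 90% figure acts as a minimum alignment coverage, and that pairwise identity and coverage values are already available. The function name, the tuple layout and the sequence identifiers are hypothetical.

```python
from collections import defaultdict

# Thresholds quoted in the abstract; treating 90% as a minimum coverage
# is an assumption of this sketch.
MIN_IDENTITY = 0.40
MIN_COVERAGE = 0.90


def cluster_sequences(sequence_ids, alignments):
    """Group sequences into connected components of the similarity graph.

    alignments: iterable of (id_a, id_b, identity, coverage) tuples, with
    identity and coverage given as fractions in [0, 1].
    """
    parent = {seq_id: seq_id for seq_id in sequence_ids}

    def find(node):
        # Union-find lookup with path halving.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for id_a, id_b, identity, coverage in alignments:
        # An edge exists only when both thresholds are met.
        if identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE:
            root_a, root_b = find(id_a), find(id_b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = defaultdict(set)
    for seq_id in sequence_ids:
        clusters[find(seq_id)].add(seq_id)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical pairwise alignment results.
example_ids = ["P1", "P2", "P3", "P4", "P5"]
example_alignments = [
    ("P1", "P2", 0.55, 0.95),  # passes both thresholds
    ("P2", "P3", 0.42, 0.91),  # passes both thresholds
    ("P3", "P4", 0.80, 0.50),  # coverage too low
    ("P4", "P5", 0.30, 0.99),  # identity too low
]
example_clusters = cluster_sequences(example_ids, example_alignments)
```

In this toy input P1, P2 and P3 end up in one cluster even though P1 and P3 were never aligned directly, which is the usual consequence of a connected-component grouping; P4 and P5 remain singletons.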
Protein sequences with known function transfer their annotation to the other sequences in the same cluster after statistical validation. Sequences with unknown function, and new sequences entering a cluster, inherit its statistically validated annotations.
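The abstract does not state which statistical test validates a cluster's annotations. As an illustration only, the sketch below assumes a hypergeometric over-representation test with Bonferroni correction and a significance level of 0.01: a term is kept when it occurs in the cluster more often than its frequency in the whole database would predict. All function names, term identifiers and counts are hypothetical.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when drawing n of N sequences, K of which carry the term."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def validated_terms(cluster_counts, cluster_size,
                    background_counts, background_size, alpha=0.01):
    """Return the terms over-represented in a cluster after Bonferroni correction."""
    n_tests = len(cluster_counts)
    kept = {}
    for term, k in cluster_counts.items():
        p_value = hypergeom_tail(k, cluster_size,
                                 background_counts[term], background_size)
        corrected = min(1.0, p_value * n_tests)
        if corrected < alpha:
            kept[term] = corrected
    return kept


# Hypothetical counts: a database of 1000 annotated sequences and a cluster of 10.
background = {"GO:A": 20, "GO:B": 500}
cluster = {"GO:A": 8, "GO:B": 5}
kept = validated_terms(cluster, 10, background, 1000)

# A new sequence entering the cluster would inherit only the validated terms.
inherited = sorted(kept)
```

Here GO:A is rare in the database but present in most cluster members, so it passes validation; GO:B occurs in the cluster at its background frequency and is not transferred.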
The method compares well with other techniques in the Critical Assessment of protein Function Annotation algorithms (CAFA). The CAFA experiment tests the performance of different predictors on a dataset that accumulates annotations over time. BAR predictions have been submitted to every edition of CAFA (BAR Plus in CAFA, BAR++ in CAFA2 and BAR 3.0 in CAFA3). The benchmarking indicates that there is still room for improvement in the field and that BAR ranks among the top-performing methods.
This work focuses on how the tool can transfer statistically significant features to poorly annotated or new sequences derived from transcriptomics or proteomics experiments.
Article Details
Copyright (c) 2017 Profiti G, et al.

This work is licensed under a Creative Commons Attribution 4.0 International License.