Abstract

Short Communication

Function Prediction of Proteins from their Sequences with BAR 3.0

Giuseppe Profiti, Pier Luigi Martelli and Rita Casadio*

Published: 23 June, 2017 | Volume 1 - Issue 1 | Pages: 001-005

Experimental annotation of protein function is slow and labor-intensive, whereas sequencing is fast and cheap. Software tools that predict protein function from sequence can therefore speed up protein annotation.

In this paper, we describe how to use our recently implemented Bologna Annotation Resource (BAR) version 3.0, a tool for protein structural and functional annotation built on over 30 million protein sequences. In BAR 3.0, sequences are arranged in a similarity graph and clustered when they share at least 40% sequence identity with at least 90% alignment coverage, for a total of 1,361,773 clusters.
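
A minimal Python sketch of the clustering step, under stated assumptions: pairwise alignment results arrive as a hypothetical `Hit` record, coverage is taken as aligned length over the length of the longer sequence, and clusters are taken as connected components of the thresholded similarity graph (single linkage). BAR 3.0's own pipeline may define coverage or linkage differently.

```python
from dataclasses import dataclass

# Thresholds from the text (40% identity, 90% coverage); the coverage definition is assumed.
MIN_IDENTITY = 0.40
MIN_COVERAGE = 0.90


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one pairwise alignment between sequences a and b."""
    a: str
    b: str
    identity: float  # fraction of identical residues in the alignment
    coverage: float  # assumed: aligned length / length of the longer sequence


def cluster_sequences(seq_ids, hits):
    """Group sequences into connected components of the thresholded similarity graph."""
    parent = {s: s for s in seq_ids}  # union-find forest, one root per cluster

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for h in hits:
        # Keep only edges that pass both thresholds, then merge their components.
        if h.identity >= MIN_IDENTITY and h.coverage >= MIN_COVERAGE:
            root_a, root_b = find(h.a), find(h.b)
            if root_a != root_b:
                parent[root_a] = root_b

    clusters = {}
    for s in seq_ids:
        clusters.setdefault(find(s), []).append(s)
    return sorted(sorted(members) for members in clusters.values())


ids = ["P1", "P2", "P3", "P4"]
hits = [
    Hit("P1", "P2", 0.55, 0.95),
    Hit("P2", "P3", 0.45, 0.92),
    Hit("P3", "P4", 0.38, 0.99),  # identity below 40%: no edge
]
print(cluster_sequences(ids, hits))  # [['P1', 'P2', 'P3'], ['P4']]
```

With single linkage, P1 and P3 end up together through P2 even though they share no qualifying edge; a stricter scheme would require more edges per member.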

Within each cluster, the annotations of sequences with known function are statistically validated and then transferred to the other members. Sequences of unknown function, as well as new sequences entering a cluster, inherit the cluster's statistically validated annotations.
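
The abstract does not name the statistical test used to validate annotations. Purely as an illustration, the sketch below assumes a one-sided hypergeometric enrichment test per term, a Bonferroni correction over the terms seen in the cluster, and a hypothetical significance level of 0.01. The function names, inputs and toy data are all hypothetical.

```python
from collections import Counter
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N population, K successes, n draws)."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)  # exact integer counts, one float division


def validate_cluster_terms(cluster_annotations, background_counts, background_total, alpha=0.01):
    """Return {term: Bonferroni-adjusted p-value} for terms enriched in one cluster.

    cluster_annotations: annotated cluster member -> set of its terms
    background_counts:   term -> number of annotated sequences carrying it database-wide
    background_total:    number of annotated sequences database-wide
    """
    n = len(cluster_annotations)
    in_cluster = Counter(t for terms in cluster_annotations.values() for t in terms)
    n_tests = len(in_cluster)  # Bonferroni over the terms seen in this cluster
    validated = {}
    for term, k in in_cluster.items():
        p = hypergeom_sf(k, background_total, background_counts[term], n)
        p_adj = min(1.0, p * n_tests)
        if p_adj < alpha:
            validated[term] = p_adj
    return validated


# Toy data invented for illustration: 5 annotated members, 1000 annotated sequences overall.
cluster = {
    "P1": {"term_A", "term_B"},
    "P2": {"term_A"},
    "P3": {"term_A", "term_B"},
    "P4": {"term_A"},
    "P5": {"term_B"},
}
background = {"term_A": 20, "term_B": 300}
print(validate_cluster_terms(cluster, background, 1000))
```

Here term_A, rare in the background but carried by four of five members, passes; term_B is common database-wide, so its presence in the cluster is not significant and it would not be transferred.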

The method compares well with other techniques in the Critical Assessment of protein Function Annotation algorithms (CAFA). CAFA tests the performance of different predictors on a dataset whose annotations accumulate over time. BAR predictions have been submitted to every edition of CAFA (BAR Plus in CAFA, BAR++ in CAFA2 and BAR 3.0 in CAFA3). The benchmarks indicate that there is still room for improvement in the field and that BAR ranks among the top-performing methods.
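
Among other measures, CAFA scores predictors with the protein-centric maximum F-measure (Fmax). A simplified Python sketch follows. It assumes predicted and true terms have already been propagated through the ontology (CAFA does this; the sketch does not) and that every benchmark protein has at least one true term; the toy data are invented.

```python
def fmax(predictions, truth, thresholds=None):
    """Protein-centric Fmax, simplified from the CAFA definition.

    predictions: protein -> {term: score in [0, 1]}
    truth:       protein -> set of true terms (assumed non-empty for every protein)
    """
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]
    best = 0.0
    for tau in thresholds:
        precisions = []   # only proteins with at least one prediction >= tau
        recall_sum = 0.0  # summed over all benchmark proteins
        for protein, true_terms in truth.items():
            predicted = {t for t, s in predictions.get(protein, {}).items() if s >= tau}
            tp = len(predicted & true_terms)
            if predicted:
                precisions.append(tp / len(predicted))
            recall_sum += tp / len(true_terms)
        if not precisions:
            continue  # precision is undefined when nothing is predicted at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


truth = {"A": {"t1", "t2"}, "B": {"t3"}}
predictions = {"A": {"t1": 0.9, "t4": 0.4}, "B": {"t3": 0.6}}
print(fmax(predictions, truth))  # best F at thresholds 0.41-0.60: 6/7, about 0.857
```

Averaging precision only over proteins that receive predictions, but recall over all proteins, follows the CAFA convention: a method is not penalized in precision for abstaining, but it pays for abstaining in recall.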

This work focuses on how the tool can transfer statistically significant features to poorly annotated or new sequences derived from transcriptomics or proteomics experiments.
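
For a new sequence, the sketch below assumes, for illustration only, that it joins the cluster of its highest-identity hit passing the 40% identity and 90% coverage thresholds, and then inherits that cluster's validated terms. The `Alignment` record, the lookup tables and the selection rule are hypothetical and are not taken from the BAR 3.0 implementation.

```python
from dataclasses import dataclass

MIN_IDENTITY = 0.40  # same thresholds as the clustering criterion
MIN_COVERAGE = 0.90


@dataclass(frozen=True)
class Alignment:
    """Hypothetical hit of a new query sequence against an already clustered sequence."""
    target: str
    identity: float
    coverage: float


def annotate_new_sequence(alignments, member_to_cluster, validated_terms):
    """Return (cluster_id, inherited_terms), or (None, set()) if no hit qualifies."""
    qualifying = [a for a in alignments
                  if a.identity >= MIN_IDENTITY and a.coverage >= MIN_COVERAGE]
    if not qualifying:
        return None, set()
    best = max(qualifying, key=lambda a: a.identity)  # assumed rule: highest-identity hit wins
    cluster_id = member_to_cluster[best.target]
    return cluster_id, set(validated_terms.get(cluster_id, ()))


member_to_cluster = {"P1": "C1", "P2": "C1", "P9": "C2"}
validated_terms = {"C1": {"term_A"}, "C2": {"term_B", "term_C"}}
hits = [Alignment("P1", 0.52, 0.93), Alignment("P9", 0.61, 0.80)]  # P9 fails coverage
print(annotate_new_sequence(hits, member_to_cluster, validated_terms))  # ('C1', {'term_A'})
```

The P9 hit has higher identity but covers too little of the alignment, so the query inherits only from C1; a sequence with no qualifying hit stays unannotated rather than receiving a low-confidence transfer.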

DOI: 10.29328/journal.hpbr.1001001

Figures:

Figure 1
