Research Details
 
Gene Function Prediction/ANNOTATOR Group
 
Our group is developing an advanced tool for the functional characterization of sequences and strives to establish the ANNOTATOR software environment as the de facto standard in this field. The scope of work includes both the integration of established algorithms and research into novel heuristics for tracing distant evolutionary relationships. Because such heuristics are complex and computationally demanding, we also have to address aspects of high-performance and distributed computing.

The ever-increasing amount of data flowing into biological databases shows no sign of leveling off. Sequencing technology is improving at an unprecedented rate, reducing the time it takes to decipher an entire genome to a matter of days. Making sense of these data by predicting molecular function manually is a time-consuming and tedious task. Moreover, the number of sequence-analysis methods constantly being added to the computational biologist's toolbox means that users must know a vast array of different interfaces, execution parameters and input formats.

The ANNOTATOR, initially conceived by the Eisenhaber group at the Institute of Molecular Pathology in Vienna and now developed and extended by our group at BII, provides an integrated environment for the analysis of sequences as well as other biologically relevant entities. Biological objects are represented in a unified data model, and an object-relational mapping layer provides long-term persistence in a relational database. Data can be submitted through web-based forms, uploaded as FASTA-formatted flat files, or imported remotely over a SOAP interface.
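To illustrate what mapping flat-file input onto a unified data model can look like, here is a minimal sketch in Python. The `SequenceRecord` class, the `parse_fasta` and `store` functions, and the table layout are all hypothetical and are not the ANNOTATOR's actual model; plain `sqlite3` stands in for the object-relational mapping layer, which this sketch does not attempt to reproduce.

```python
import sqlite3
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class SequenceRecord:
    """Hypothetical unified representation of one input sequence."""
    identifier: str
    description: str
    residues: str


def _make_record(header: str, chunks: List[str]) -> SequenceRecord:
    # The identifier is the first whitespace-delimited token of the header;
    # everything after it is kept as a free-text description.
    parts = header.split(maxsplit=1)
    identifier = parts[0] if parts else ""
    description = parts[1] if len(parts) > 1 else ""
    return SequenceRecord(identifier, description, "".join(chunks).upper())


def parse_fasta(lines: Iterable[str]) -> List[SequenceRecord]:
    """Parse FASTA-formatted lines into SequenceRecord objects."""
    records: List[SequenceRecord] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


def store(records: List[SequenceRecord], conn: sqlite3.Connection) -> None:
    """Persist records; plain sqlite3 stands in for the ORM layer here."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sequence "
        "(identifier TEXT PRIMARY KEY, description TEXT, residues TEXT)"
    )
    conn.executemany(
        "INSERT INTO sequence VALUES (?, ?, ?)",
        [(r.identifier, r.description, r.residues) for r in records],
    )
    conn.commit()
```

For example, `parse_fasta(text.splitlines())` followed by `store(records, sqlite3.connect(":memory:"))` loads a FASTA file into an in-memory database. Whatever the input channel (web form, file upload or remote import), converting everything into one record type early is what lets downstream analyses ignore where the data came from.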

A large number of external algorithms are plugged into the ANNOTATOR and can be used to analyze sequences. The applicable algorithms are presented in a way that closely follows the standard procedure for segment-based sequence analysis. This procedure assumes that proteins are chains of functional units that can be analyzed independently, and that the overall function of a protein arises from the synthesis of the functions predicted for each individual module.
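The segment-based idea can be sketched in a few lines of Python: cut the sequence into segments, run every predictor on each segment independently, and combine the per-segment results into an overall description. This is only an illustration under stated assumptions: the segment boundaries are taken as given (in practice they would come from upstream segment or domain predictors), and `hydrophobic_stretch` and `charged_region` are hypothetical toy predictors based on simple residue fractions, not real function-prediction methods.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

# A predictor inspects one segment and returns a label, or None if nothing is found.
Predictor = Callable[[str], Optional[str]]


@dataclass(frozen=True)
class Segment:
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    label: str


def annotate_segments(sequence: str, boundaries: Iterable[int],
                      predictors: Dict[str, Predictor]) -> List[Segment]:
    """Cut the sequence at the given boundaries and analyze each piece independently."""
    inner = sorted(b for b in set(boundaries) if 0 < b < len(sequence))
    cuts = [0, *inner, len(sequence)]
    segments: List[Segment] = []
    for start, end in zip(cuts, cuts[1:]):
        if start >= end:
            continue  # guard against an empty input sequence
        piece = sequence[start:end]
        labels = [label for predict in predictors.values()
                  if (label := predict(piece)) is not None]
        segments.append(Segment(start, end, "; ".join(labels) or "unassigned"))
    return segments


def summarize(segments: List[Segment]) -> str:
    """Combine per-segment labels, N- to C-terminal, into one overall description."""
    return " | ".join(s.label for s in segments)


# Toy predictors for illustration only; they are NOT real function predictors.
HYDROPHOBIC = set("AILMFVW")
CHARGED = set("DEKR")


def hydrophobic_stretch(piece: str) -> Optional[str]:
    frac = sum(r in HYDROPHOBIC for r in piece) / len(piece)
    return "hydrophobic stretch" if frac >= 0.6 else None


def charged_region(piece: str) -> Optional[str]:
    frac = sum(r in CHARGED for r in piece) / len(piece)
    return "charged region" if frac >= 0.5 else None
```

With the toy sequence `"MKKRDE" + "LLIVFAVL" + "GSGSGS"` and boundaries `[6, 14]`, the three segments are labeled "charged region", "hydrophobic stretch" and "unassigned" respectively. Keeping the predictors as independent callables mirrors the modular assumption: adding a new method means registering one more entry in `predictors`, without touching the segmentation or the summary step.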

A sophisticated user interface visualizes the function predictions, giving researchers an instant overview while also allowing them to drill down into the detailed raw data (see Illustration 1).
 


Illustration 1: Sequence analysis of dysferlin with a highlighted C2 domain
 
Apart from integrating new external algorithms, we also develop so-called "integrated algorithms", which allow us to model complex heuristics for the discovery of molecular function. The availability of sequence data from a wide range of organisms makes evolutionary relationships easier to trace, since previously "missing links" can serve as bridges between hitherto unconnected parts of the sequence universe. However, no researcher can build these bridges manually. We have therefore developed a number of iterative heuristics for collecting protein families.
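The general principle behind iterative family collection can be sketched as a breadth-first expansion: every newly found member is used as a query in the next round, so distant members become reachable through intermediate "bridge" sequences. The sketch below is a generic illustration, not the group's published heuristic; `search` is a hypothetical callable returning the identifiers of significant hits, and a real procedure would add significance thresholds and checks against pulling in unrelated proteins.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

# Hypothetical interface: search(query_id) returns the ids of significant hits.
SearchFn = Callable[[str], Iterable[str]]


def collect_family(seed: str, search: SearchFn, max_rounds: int = 10,
                   max_members: int = 10_000) -> Dict[str, Tuple[Optional[str], int]]:
    """Iteratively grow a family: every newly found member becomes a query.

    Returns member -> (query that first found it, round in which it was found).
    The seed is recorded as (None, 0).
    """
    found: Dict[str, Tuple[Optional[str], int]] = {seed: (None, 0)}
    frontier = [seed]
    for round_no in range(1, max_rounds + 1):
        next_frontier = []
        for query in frontier:
            for hit in search(query):
                if hit in found:
                    continue  # already a member; do not search it again
                found[hit] = (query, round_no)
                next_frontier.append(hit)
                if len(found) >= max_members:
                    return found  # crude safety cap on family size
        if not next_frontier:
            break  # converged: no new members in this round
        frontier = next_frontier
    return found
```

Recording which query first found each member preserves the chain of bridges: in a toy graph where A hits B, B hits C and C hits D, the result shows that D is connected to the seed A only through B and C, even though A and D never hit each other directly.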

Applying these workflows can trigger tens of thousands of individual homology searches and may produce data in the terabyte range, which is why we additionally focus on issues of high-performance and distributed computing.
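Because the searches issued within one round of such a workflow are independent of each other, they can be dispatched concurrently. The following minimal Python sketch uses a thread pool from the standard `concurrent.futures` module; `run_searches` and the `search` callable are hypothetical. Threads are a reasonable assumption when each search launches an external program and mostly waits for it; CPU-bound Python code would call for `ProcessPoolExecutor`, and workloads at the scale described above would typically go to a cluster scheduler instead.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Tuple, TypeVar

R = TypeVar("R")


def run_searches(queries: Iterable[str], search: Callable[[str], R],
                 max_workers: int = 8) -> Tuple[Dict[str, R], Dict[str, Exception]]:
    """Run independent searches concurrently and collect results and failures."""
    results: Dict[str, R] = {}
    failures: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each submitted future back to the query that produced it.
        futures = {pool.submit(search, q): q for q in queries}
        for future in as_completed(futures):
            query = futures[future]
            try:
                results[query] = future.result()
            except Exception as exc:  # one failed search must not abort the batch
                failures[query] = exc
    return results, failures
```

Failures are returned separately rather than raised, so that with tens of thousands of searches a single crashed job can be retried on its own instead of forcing the whole batch to rerun.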

The ANNOTATOR has provided crucial input to a wide range of projects:
 
1. Maurer-Stroh S, Koranda M, Benetka W, Schneider G, Sirota FL, Eisenhaber F.
Towards complete sets of farnesylated and geranylgeranylated proteins.
PLoS Comput Biol. 2007 Apr 6;3(4):e66. Epub 2007 Feb 23.
PMID: 17411337

2. Neuberger G, Schneider G, Eisenhaber F.
pkaPS: prediction of protein kinase A phosphorylation sites with the simplified kinase-substrate binding model.
Biol Direct. 2007 Jan 12;2:1.
PMID: 17222345

3. Novatchkova M, Schneider G, Fritz R, Eisenhaber F, Schleiffer A.
DOUTfinder--identification of distant domain outliers using subsignificant sequence similarity.
Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W214-8.
PMID: 16844996

4. Schneider G, Neuberger G, Wildpaner M, Tian S, Berezovsky I, Eisenhaber F.
Application of a sensitive collection heuristic for very large protein families: evolutionary relationship between adipose triglyceride lipase (ATGL) and classic mammalian lipases.
BMC Bioinformatics. 2006 Mar 21;7:164.
PMID: 16551354

5. Maurer-Stroh S, Gouda M, Novatchkova M, Schleiffer A, Schneider G, Sirota FL, Wildpaner M, Hayashi N, Eisenhaber F.
MYRbase: analysis of genome-wide glycine myristoylation enlarges the functional spectrum of eukaryotic myristoylated proteins.
Genome Biol. 2004;5(3):R21. Epub 2004 Feb 13.
PMID: 15003124

6. Eisenhaber F, Eisenhaber B, Kubina W, Maurer-Stroh S, Neuberger G, Schneider G, Wildpaner M.
Prediction of lipid posttranslational modifications and localization signals from protein sequences: big-Pi, NMT and PTS1.
Nucleic Acids Res. 2003 Jul 1;31(13):3631-4.
PMID: 12824382

7. Wick N, Luedemann S, Vietor I, Cotten M, Wildpaner M, Schneider G, Eisenhaber F, Huber LA.
Induction of short interspersed nuclear repeat-containing transcripts in epithelial cells upon infection with a chicken adenovirus.
J Mol Biol. 2003 May 9;328(4):779-90.
PMID: 12729754

8. Eisenhaber B, Maurer-Stroh S, Novatchkova M, Schneider G, Eisenhaber F.
Enzymes and auxiliary factors for GPI lipid anchor biosynthesis and post-translational transfer to proteins.
Bioessays. 2003 Apr;25(4):367-85. Review.
PMID: 12655644

9. Wildpaner M, Schneider G, Schleiffer A, Eisenhaber F.
Taxonomy workbench.
Bioinformatics. 2001 Dec;17(12):1179-82.
PMID: 11751226